PII: S0306-4573(98)00040-5

Author

  • Myoung-Cheol Kim
Abstract

In this paper, we present a comparison of collocation-based similarity measures: Jaccard, Dice and Cosine similarity measures for the proper selection of additional search terms in query expansion. In addition, we consider two more similarity measures: average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean value of two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information. All these similarity measures are functions of any two terms' frequencies and the collocation frequency, but are different in the methods of measurement. The selected measure changes the order of additional search terms and their weights, and hence has a strong influence on the retrieval performance. In our experiments of query expansion using these five similarity measures, the additional search terms of Jaccard, Dice and Cosine similarity measures include more frequent terms with lower similarity values than ACP or NMI. In overall assessments of query expansion, the Jaccard, Dice and Cosine similarity measures are better than ACP and NMI in terms of retrieval effectiveness, whereas NMI and ACP are better in terms of execution efficiency. © 1999 Elsevier Science Ltd. All rights reserved.
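As a rough illustration of the measures named in the abstract, the sketch below computes each one from two term frequencies and their collocation frequency. The Jaccard, Dice, Cosine and ACP forms follow their standard definitions. The abstract does not state how NMI is normalized, so the `nmi` function here is an assumption: it uses pointwise mutual information divided by -log p(x, y), which may differ from the paper's formula. The function names, the counting unit (documents or text windows), the total count `n`, and the sample counts are all hypothetical.

```python
import math

def jaccard(fx, fy, fxy):
    # Collocation count over the count of units containing either term.
    return fxy / (fx + fy - fxy)

def dice(fx, fy, fxy):
    # Twice the collocation count over the sum of both term frequencies.
    return 2 * fxy / (fx + fy)

def cosine(fx, fy, fxy):
    # Collocation count over the geometric mean of the term frequencies.
    return fxy / math.sqrt(fx * fy)

def acp(fx, fy, fxy):
    # Average conditional probability: mean of P(y|x) and P(x|y).
    return 0.5 * (fxy / fx + fxy / fy)

def nmi(fx, fy, fxy, n):
    # ASSUMPTION: pointwise mutual information normalized by -log p(x, y).
    # The abstract only says "normalized"; the paper may normalize differently.
    if fxy == 0:
        return -1.0  # limiting value when the terms never co-occur
    if fxy == n:
        return 1.0   # both terms occur in every unit
    pxy = fxy / n
    pmi = math.log(pxy / ((fx / n) * (fy / n)))
    return pmi / -math.log(pxy)

def rank_candidates(fq, candidates, measure):
    # candidates maps a term to (term frequency, collocation frequency with the query term).
    scored = [(term, measure(fq, ft, fqt)) for term, (ft, fqt) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical counts: the query term occurs in 40 units.
candidates = {"retrieval": (25, 10), "system": (300, 30), "thesaurus": (5, 4)}
for name, measure in [("jaccard", jaccard), ("dice", dice), ("cosine", cosine), ("acp", acp)]:
    print(name, rank_candidates(40, candidates, measure))
```

Ranking the same candidates under each measure shows how the choice of measure can reorder the additional search terms, which is the effect the abstract describes.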


Similar resources

Browsing is a collaborative process

Interfaces to databases have traditionally been designed as single-user systems that hide other users and their activity. This paper aims to show that collaboration is an important aspect of searching online information stores that requires explicit computerised support. The claim is made that a truly user-centred system must acknowledge and support collaborative interactions between users. C...


Automatic performance evaluation of Web search engines

Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of human relevance judgments involved. However, both for business enterprises and people it is important to know the most effective Web search engines, since such search engines help their users find higher number of relevant Web pages with less effort. Furthermore, this information can be used ...


Crossover Improvement for the Genetic Algorithm in Information Retrieval

Genetic algorithms (GAs) search for good solutions to a problem by operations inspired from the natural selection of living beings. Among their many uses, we can count information retrieval (IR). In this field, the aim of the GA is to help an IR system to find, in a huge documents text collection, a good reply to a query expressed by the user. The analysis of phenomena seen during the implement...


Visualising Semantic Spaces and Author Co-Citation Networks in Digital Libraries

This paper describes the development and application of visualisation techniques for users to access and explore information in a digital library effectively and intuitively. Salient semantic structures and citation patterns are extracted from several collections of documents, including the ACM SIGCHI Conference Proceedings (1995-1997) and ACM Hypertext Conference Proceedings (1987-1998), using ...



Publication date: 1998